Use carryless multiply for Compressor forwarding metadata #1441
no-defun-allowed wants to merge 17 commits into mmtk:master
And adjust the region and work packet sizes.
Some issues come to mind.
Plotty and here are the geomean results for clmul-enabled relative to clmul-disabled:
To clarify, do both builds include the change for computing multiple regions in one
I tested this PR against upstream; Plotty and a tally of changes to stop-the-world times:
Some benchmarks see large improvements due to the better work balancing; some see small regressions due to worse locality with smaller region sizes. (I only found the mutator time to be consistently worse on tradesoap, which also has the worst STW regression at 5.4% slower.)
I also inadvertently wrote a For now I'll make
We could have just done I think what we have done is just a workaround for the unstable Rust API. Maybe we just manually implement
Yeah, I don't like this approach of breaking a range into words and bytes; this (rather accurate to the paper) take on the Compressor always uses metadata which is word-aligned with
This causes fewer headaches with the borrow checker.
Fixes an unused use warning on non-x86_64
The big wins are actually in being able to clmul-ise
No benchmarks regress either, which is neat.


This PR introduces an algorithm for computing the offset vector in the Compressor which uses the carryless multiply instruction, based on the branch-free and bit-parallel algorithm in https://branchfree.org/2019/03/06/code-fragment-finding-quote-pairs-with-carry-less-multiply-pclmulqdq/
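As a rough illustration of the trick described in the linked post (not the code in this PR): carry-less multiplying a word by an all-ones constant yields its prefix XOR, so every bit between a pair of set marker bits becomes 1 without any branching. The sketch below uses a portable software carry-less multiply so that it runs anywhere; on x86_64 the same product would come from the PCLMULQDQ instruction. The function names `clmul_lo`, `prefix_xor`, `prefix_xor_naive` and `covered_between_pairs` are hypothetical, and reading the set bits as paired first/last markers of an object is an assumption made only for this example.

```rust
/// Software carry-less multiply of two 64-bit values, keeping only the
/// low 64 bits of the 128-bit product. Hardware (e.g. PCLMULQDQ on
/// x86_64) computes the same product in a single instruction.
fn clmul_lo(a: u64, b: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..64 {
        if (b >> i) & 1 == 1 {
            // XOR instead of ADD: partial products never carry.
            acc ^= a << i;
        }
    }
    acc
}

/// Prefix XOR via carry-less multiply by all-ones: bit `k` of the result
/// is the XOR (parity) of bits `0..=k` of `bits`.
fn prefix_xor(bits: u64) -> u64 {
    clmul_lo(bits, u64::MAX)
}

/// Bit-by-bit reference implementation, used only to check `prefix_xor`.
fn prefix_xor_naive(bits: u64) -> u64 {
    let mut out = 0u64;
    let mut parity = 0u64;
    for i in 0..64 {
        parity ^= (bits >> i) & 1;
        out |= parity << i;
    }
    out
}

/// Hypothetical helper: assuming set bits come in (first, last) pairs,
/// count the bits covered by each pair, both endpoints included.
/// The prefix XOR covers first..last (excluding last), so OR the
/// original markers back in before counting.
fn covered_between_pairs(bits: u64) -> u32 {
    (prefix_xor(bits) | bits).count_ones()
}
```

For example, with markers at bits 2 and 5 (`0b0010_0100`), `prefix_xor` returns `0b0001_1100` and `covered_between_pairs` returns 4. A pair that straddles a 64-bit word boundary would need the parity carried over from the previous word, which this sketch leaves out.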